Dataset statistics
| Number of variables | 9 |
|---|---|
| Number of observations | 2247 |
| Missing cells | 0 |
| Missing cells (%) | 0.0% |
| Duplicate rows | 0 |
| Duplicate rows (%) | 0.0% |
| Total size in memory | 158.1 KiB |
| Average record size in memory | 72.1 B |
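The memory figures above can be roughly reproduced with simple arithmetic. The sketch below assumes that every column stores 8 bytes per value (as `float64` and `datetime64[ns]` do) and that the index adds a fixed 128 bytes. Both numbers are assumptions for illustration, not values stated in the report.

```python
# Figures copied from the dataset statistics table.
n_rows = 2247
n_columns = 9

# Assumptions (not stated in the report):
bytes_per_value = 8   # float64 and datetime64[ns] both use 8 bytes per value
index_bytes = 128     # fixed overhead of a default integer range index

total_bytes = n_rows * n_columns * bytes_per_value + index_bytes
total_kib = total_bytes / 1024
bytes_per_record = total_bytes / n_rows

print(f"Total size: {total_kib:.1f} KiB")
print(f"Average record size: {bytes_per_record:.1f} B")
```

Under these assumptions the result agrees with the reported 158.1 KiB and 72.1 B.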
Variable types
| DateTime | 1 |
|---|---|
| Numeric | 8 |
relative_humidity is highly correlated with absolute_humidity | High correlation |
absolute_humidity is highly correlated with relative_humidity and 2 other fields | High correlation |
sensor_1 is highly correlated with sensor_2 and 3 other fields | High correlation |
sensor_2 is highly correlated with sensor_1 and 3 other fields | High correlation |
sensor_3 is highly correlated with absolute_humidity and 4 other fields | High correlation |
sensor_4 is highly correlated with absolute_humidity and 4 other fields | High correlation |
sensor_5 is highly correlated with sensor_1 and 3 other fields | High correlation |
absolute_humidity is highly correlated with sensor_4 | High correlation |
sensor_1 is highly correlated with sensor_2 and 3 other fields | High correlation |
sensor_2 is highly correlated with sensor_1 and 3 other fields | High correlation |
sensor_3 is highly correlated with sensor_1 and 3 other fields | High correlation |
sensor_4 is highly correlated with absolute_humidity and 4 other fields | High correlation |
sensor_5 is highly correlated with sensor_1 and 3 other fields | High correlation |
deg_C is highly correlated with relative_humidity and 5 other fields | High correlation |
relative_humidity is highly correlated with deg_C and 2 other fields | High correlation |
absolute_humidity is highly correlated with deg_C and 4 other fields | High correlation |
sensor_1 is highly correlated with deg_C and 4 other fields | High correlation |
sensor_2 is highly correlated with deg_C and 5 other fields | High correlation |
sensor_3 is highly correlated with deg_C and 5 other fields | High correlation |
sensor_4 is highly correlated with deg_C and 6 other fields | High correlation |
sensor_5 is highly correlated with sensor_1 and 3 other fields | High correlation |
date_time has unique values | Unique |
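A minimal sketch of how a high-correlation alert of this kind could be derived: compute a pairwise coefficient and flag every pair whose absolute value reaches a cut-off. The helper names `pearson_r` and `high_correlation_pairs` and the threshold of 0.9 are assumptions for illustration; the report does not state which threshold produced the alerts above. The data are the first ten rows of three columns from the sample table, so the result need not match the alerts computed on all 2247 observations.

```python
from itertools import combinations
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def high_correlation_pairs(columns, threshold=0.9):
    """Return (name_a, name_b, r) for every pair with |r| >= threshold.

    The default threshold of 0.9 is an assumed cut-off, used here only
    for illustration.
    """
    flagged = []
    for (name_a, col_a), (name_b, col_b) in combinations(columns.items(), 2):
        r = pearson_r(col_a, col_b)
        if abs(r) >= threshold:
            flagged.append((name_a, name_b, r))
    return flagged


# First ten rows of three columns, copied from the sample table in this report.
sample = {
    "deg_C": [8.0, 5.1, 5.8, 5.0, 4.5, 4.5, 3.3, 3.2, 2.5, 3.9],
    "sensor_1": [1108.8, 1249.5, 1102.6, 1139.7, 1022.4,
                 1004.0, 940.9, 954.5, 969.9, 976.6],
    "sensor_4": [880.0, 972.8, 941.9, 1011.0, 967.0,
                 989.1, 896.8, 935.6, 959.3, 906.0],
}

for name_a, name_b, r in high_correlation_pairs(sample, threshold=0.9):
    print(f"{name_a} is highly correlated with {name_b} (r = {r:.3f})")
```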
Reproduction
| Analysis started | 2021-10-05 10:16:00.992789 |
|---|---|
| Analysis finished | 2021-10-05 10:16:10.994515 |
| Duration | 10 seconds |
| Software version | pandas-profiling v3.1.0 |
| Download configuration | config.json |
date_time
| Distinct | 2247 |
|---|---|
| Distinct (%) | 100.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 17.7 KiB |
| Minimum | 2011-01-01 00:00:00 |
|---|---|
| Maximum | 2011-04-04 14:00:00 |
Histogram with fixed size bins (bins=50)
deg_C
| Distinct | 280 |
|---|---|
| Distinct (%) | 12.5% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 10.80814419 |
| Minimum | -1.8 |
|---|---|
| Maximum | 30.9 |
| Zeros | 1 |
| Zeros (%) | < 0.1% |
| Negative | 13 |
| Negative (%) | 0.6% |
| Memory size | 17.7 KiB |
Quantile statistics
| Minimum | -1.8 |
|---|---|
| 5-th percentile | 2.9 |
| Q1 | 5.6 |
| median | 9.8 |
| Q3 | 14.2 |
| 95-th percentile | 23.7 |
| Maximum | 30.9 |
| Range | 32.7 |
| Interquartile range (IQR) | 8.6 |
Descriptive statistics
| Standard deviation | 6.44449695 |
|---|---|
| Coefficient of variation (CV) | 0.5962630435 |
| Kurtosis | -0.2825219874 |
| Mean | 10.80814419 |
| Median Absolute Deviation (MAD) | 4.3 |
| Skewness | 0.6772607799 |
| Sum | 24285.9 |
| Variance | 41.53154094 |
| Monotonicity | Not monotonic |
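Several of the statistics above are simple functions of one another, which makes the tables easy to sanity-check. The sketch below recomputes the derived quantities from the reported values for this variable. The variable names are illustrative, and small differences in the last digits are expected because the inputs are already rounded.

```python
from math import isclose

# Values copied from the summary, quantile and descriptive tables above.
n = 2247
mean = 10.80814419
std = 6.44449695
q1, q3 = 5.6, 14.2
minimum, maximum = -1.8, 30.9

cv = std / mean                  # coefficient of variation
variance = std ** 2              # variance is the squared standard deviation
iqr = q3 - q1                    # interquartile range
value_range = maximum - minimum  # range
total = mean * n                 # sum of all observations

print(f"CV       = {cv:.10f}")
print(f"Variance = {variance:.8f}")
print(f"IQR      = {iqr:.1f}")
print(f"Range    = {value_range:.1f}")
print(f"Sum      = {total:.1f}")

# Each recomputed value agrees with the reported one within rounding.
assert isclose(cv, 0.5962630435, rel_tol=1e-6)
assert isclose(variance, 41.53154094, rel_tol=1e-6)
```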
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 4.2 | 30 | 1.3% |
| 4.1 | 27 | 1.2% |
| 3.9 | 25 | 1.1% |
| 4.6 | 25 | 1.1% |
| 3.4 | 23 | 1.0% |
| 6 | 22 | 1.0% |
| 6.7 | 22 | 1.0% |
| 4.7 | 21 | 0.9% |
| 6.5 | 21 | 0.9% |
| 5.7 | 20 | 0.9% |
| Other values (270) | 2011 |
| Value | Count | Frequency (%) |
| -1.8 | 1 | |
| -1.3 | 2 | |
| -1.2 | 2 | |
| -1.1 | 1 | |
| -0.6 | 2 | |
| -0.5 | 1 | |
| -0.3 | 1 | |
| -0.2 | 1 | |
| -0.1 | 2 | |
| 0 | 1 |
| Value | Count | Frequency (%) |
| 30.9 | 1 | |
| 30.3 | 1 | |
| 29.6 | 1 | |
| 29.4 | 1 | |
| 29.1 | 1 | |
| 28.2 | 2 | |
| 27.9 | 2 | |
| 27.5 | 1 | |
| 27.3 | 1 | |
| 27.2 | 2 |
relative_humidity
| Distinct | 653 |
|---|---|
| Distinct (%) | 29.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 51.03124166 |
| Minimum | 9.8 |
|---|---|
| Maximum | 88.8 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 17.7 KiB |
Quantile statistics
| Minimum | 9.8 |
|---|---|
| 5-th percentile | 24.93 |
| Q1 | 36.9 |
| median | 50.6 |
| Q3 | 63.55 |
| 95-th percentile | 78.6 |
| Maximum | 88.8 |
| Range | 79 |
| Interquartile range (IQR) | 26.65 |
Descriptive statistics
| Standard deviation | 16.66504715 |
|---|---|
| Coefficient of variation (CV) | 0.3265655823 |
| Kurtosis | -0.8112609995 |
| Mean | 51.03124166 |
| Median Absolute Deviation (MAD) | 13.4 |
| Skewness | 0.05956760129 |
| Sum | 114667.2 |
| Variance | 277.7237965 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 35.6 | 13 | 0.6% |
| 53.4 | 11 | 0.5% |
| 70.2 | 10 | 0.4% |
| 56 | 10 | 0.4% |
| 51 | 10 | 0.4% |
| 61.7 | 10 | 0.4% |
| 34.1 | 10 | 0.4% |
| 31.7 | 10 | 0.4% |
| 42.7 | 9 | 0.4% |
| 76.6 | 9 | 0.4% |
| Other values (643) | 2145 |
| Value | Count | Frequency (%) |
| 9.8 | 1 | |
| 10.4 | 1 | |
| 10.7 | 1 | |
| 12.7 | 1 | |
| 12.8 | 1 | |
| 13.2 | 1 | |
| 13.3 | 1 | |
| 13.5 | 1 | |
| 13.9 | 1 | |
| 14.2 | 1 |
| Value | Count | Frequency (%) |
| 88.8 | 1 | |
| 88.3 | 2 | |
| 88.1 | 1 | |
| 88 | 1 | |
| 87 | 2 | |
| 86.9 | 1 | |
| 86.4 | 1 | |
| 86.2 | 1 | |
| 86.1 | 1 | |
| 85.8 | 2 |
absolute_humidity
Real number (ℝ≥0)
HIGH CORRELATION
| Distinct | 1915 |
|---|---|
| Distinct (%) | 85.2% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 0.6270527815 |
| Minimum | 0.1847 |
|---|---|
| Maximum | 1.393 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 17.7 KiB |
Quantile statistics
| Minimum | 0.1847 |
|---|---|
| 5-th percentile | 0.23283 |
| Q1 | 0.41335 |
| median | 0.5964 |
| Q3 | 0.80495 |
| 95-th percentile | 1.10381 |
| Maximum | 1.393 |
| Range | 1.2083 |
| Interquartile range (IQR) | 0.3916 |
Descriptive statistics
| Standard deviation | 0.266588167 |
|---|---|
| Coefficient of variation (CV) | 0.4251447006 |
| Kurtosis | -0.5243672511 |
| Mean | 0.6270527815 |
| Median Absolute Deviation (MAD) | 0.1908 |
| Skewness | 0.4674074027 |
| Sum | 1408.9876 |
| Variance | 0.07106925079 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 0.2345 | 4 | 0.2% |
| 0.2324 | 4 | 0.2% |
| 0.7238 | 4 | 0.2% |
| 0.2353 | 4 | 0.2% |
| 0.2302 | 4 | 0.2% |
| 0.2303 | 4 | 0.2% |
| 0.6826 | 4 | 0.2% |
| 0.2322 | 4 | 0.2% |
| 0.2259 | 3 | 0.1% |
| 0.5458 | 3 | 0.1% |
| Other values (1905) | 2209 |
| Value | Count | Frequency (%) |
| 0.1847 | 1 | |
| 0.1862 | 1 | |
| 0.191 | 1 | |
| 0.1975 | 1 | |
| 0.2031 | 1 | |
| 0.2062 | 1 | |
| 0.2086 | 1 | |
| 0.2157 | 1 | |
| 0.2202 | 1 | |
| 0.221 | 1 |
| Value | Count | Frequency (%) |
| 1.393 | 1 | |
| 1.3838 | 1 | |
| 1.3342 | 1 | |
| 1.3224 | 2 | |
| 1.3219 | 1 | |
| 1.3208 | 1 | |
| 1.3182 | 1 | |
| 1.3171 | 1 | |
| 1.3154 | 1 | |
| 1.3135 | 1 |
sensor_1
| Distinct | 1758 |
|---|---|
| Distinct (%) | 78.2% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 1106.53449 |
| Minimum | 665.9 |
|---|---|
| Maximum | 1882.9 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 17.7 KiB |
Quantile statistics
| Minimum | 665.9 |
|---|---|
| 5-th percentile | 829.08 |
| Q1 | 951.5 |
| median | 1080.4 |
| Q3 | 1222.1 |
| 95-th percentile | 1500.84 |
| Maximum | 1882.9 |
| Range | 1217 |
| Interquartile range (IQR) | 270.6 |
Descriptive statistics
| Standard deviation | 205.341455 |
|---|---|
| Coefficient of variation (CV) | 0.1855716716 |
| Kurtosis | 0.3067191872 |
| Mean | 1106.53449 |
| Median Absolute Deviation (MAD) | 133.4 |
| Skewness | 0.7197817369 |
| Sum | 2486383 |
| Variance | 42165.11315 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 936 | 5 | 0.2% |
| 1117.2 | 4 | 0.2% |
| 1000.6 | 4 | 0.2% |
| 952.8 | 4 | 0.2% |
| 909 | 4 | 0.2% |
| 1068.9 | 4 | 0.2% |
| 899.6 | 4 | 0.2% |
| 1025.9 | 4 | 0.2% |
| 1134.6 | 4 | 0.2% |
| 1295.3 | 4 | 0.2% |
| Other values (1748) | 2206 |
| Value | Count | Frequency (%) |
| 665.9 | 1 | |
| 709.1 | 1 | |
| 713.5 | 1 | |
| 719.2 | 1 | |
| 722.2 | 1 | |
| 727.6 | 2 | |
| 727.7 | 2 | |
| 732.8 | 1 | |
| 736.6 | 1 | |
| 739.8 | 1 |
| Value | Count | Frequency (%) |
| 1882.9 | 1 | |
| 1842.8 | 1 | |
| 1838.6 | 1 | |
| 1822 | 1 | |
| 1815.8 | 1 | |
| 1797.6 | 1 | |
| 1795 | 1 | |
| 1783.7 | 1 | |
| 1780.8 | 1 | |
| 1762.8 | 1 |
sensor_2
| Distinct | 1816 |
|---|---|
| Distinct (%) | 80.8% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 836.4597686 |
| Minimum | 356.2 |
|---|---|
| Maximum | 1776.1 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 17.7 KiB |
Quantile statistics
| Minimum | 356.2 |
|---|---|
| 5-th percentile | 401.89 |
| Q1 | 640.7 |
| median | 800.8 |
| Q3 | 1016.1 |
| 95-th percentile | 1327.16 |
| Maximum | 1776.1 |
| Range | 1419.9 |
| Interquartile range (IQR) | 375.4 |
Descriptive statistics
| Standard deviation | 272.8165854 |
|---|---|
| Coefficient of variation (CV) | 0.3261562548 |
| Kurtosis | -0.2489792653 |
| Mean | 836.4597686 |
| Median Absolute Deviation (MAD) | 185.4 |
| Skewness | 0.448121094 |
| Sum | 1879525.1 |
| Variance | 74428.88926 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 377.5 | 10 | 0.4% |
| 401.1 | 8 | 0.4% |
| 373.5 | 7 | 0.3% |
| 408.9 | 6 | 0.3% |
| 416.8 | 5 | 0.2% |
| 774.2 | 5 | 0.2% |
| 688 | 4 | 0.2% |
| 691.8 | 4 | 0.2% |
| 913.9 | 4 | 0.2% |
| 412.9 | 4 | 0.2% |
| Other values (1806) | 2190 |
| Value | Count | Frequency (%) |
| 356.2 | 1 | < 0.1% |
| 364.3 | 1 | < 0.1% |
| 364.5 | 1 | < 0.1% |
| 364.7 | 1 | < 0.1% |
| 365.2 | 2 | |
| 365.5 | 3 | |
| 365.7 | 2 | |
| 365.8 | 1 | < 0.1% |
| 368.3 | 1 | < 0.1% |
| 368.4 | 1 | < 0.1% |
| Value | Count | Frequency (%) |
| 1776.1 | 1 | |
| 1746.2 | 1 | |
| 1731 | 1 | |
| 1677.8 | 1 | |
| 1664.2 | 1 | |
| 1636.6 | 1 | |
| 1624.8 | 1 | |
| 1621 | 1 | |
| 1618.6 | 1 | |
| 1579 | 1 |
sensor_3
| Distinct | 1833 |
|---|---|
| Distinct (%) | 81.6% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 828.3214953 |
| Minimum | 320.1 |
|---|---|
| Maximum | 1975 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 17.7 KiB |
Quantile statistics
| Minimum | 320.1 |
|---|---|
| 5-th percentile | 447.43 |
| Q1 | 597.05 |
| median | 757.1 |
| Q3 | 944.95 |
| 95-th percentile | 1737.6 |
| Maximum | 1975 |
| Range | 1654.9 |
| Interquartile range (IQR) | 347.9 |
Descriptive statistics
| Standard deviation | 339.5117785 |
|---|---|
| Coefficient of variation (CV) | 0.4098792322 |
| Kurtosis | 2.219109259 |
| Mean | 828.3214953 |
| Median Absolute Deviation (MAD) | 169.3 |
| Skewness | 1.512811922 |
| Sum | 1861238.4 |
| Variance | 115268.2478 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 666.4 | 6 | 0.3% |
| 750.5 | 5 | 0.2% |
| 824.7 | 5 | 0.2% |
| 842.5 | 4 | 0.2% |
| 587 | 4 | 0.2% |
| 702 | 4 | 0.2% |
| 538.6 | 4 | 0.2% |
| 452 | 4 | 0.2% |
| 776.2 | 4 | 0.2% |
| 719.1 | 4 | 0.2% |
| Other values (1823) | 2203 |
| Value | Count | Frequency (%) |
| 320.1 | 1 | |
| 325.2 | 1 | |
| 344.6 | 1 | |
| 351.9 | 1 | |
| 354.3 | 1 | |
| 356 | 1 | |
| 359 | 1 | |
| 360 | 1 | |
| 366 | 1 | |
| 366.3 | 2 |
| Value | Count | Frequency (%) |
| 1975 | 1 | |
| 1953.1 | 1 | |
| 1952.4 | 1 | |
| 1940.1 | 1 | |
| 1940 | 1 | |
| 1938.4 | 1 | |
| 1937.5 | 1 | |
| 1923.2 | 1 | |
| 1921.7 | 1 | |
| 1921 | 2 |
sensor_4
| Distinct | 1877 |
|---|---|
| Distinct (%) | 83.5% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 1104.850601 |
| Minimum | 523.4 |
|---|---|
| Maximum | 2211.4 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 17.7 KiB |
Quantile statistics
| Minimum | 523.4 |
|---|---|
| 5-th percentile | 623.93 |
| Q1 | 899.45 |
| median | 1076.2 |
| Q3 | 1288.35 |
| 95-th percentile | 1620.22 |
| Maximum | 2211.4 |
| Range | 1688 |
| Interquartile range (IQR) | 388.9 |
Descriptive statistics
| Standard deviation | 293.1122248 |
|---|---|
| Coefficient of variation (CV) | 0.2652958007 |
| Kurtosis | -0.01266502103 |
| Mean | 1104.850601 |
| Median Absolute Deviation (MAD) | 193.6 |
| Skewness | 0.4236120482 |
| Sum | 2482599.3 |
| Variance | 85914.77634 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 989.8 | 4 | 0.2% |
| 893 | 4 | 0.2% |
| 993.2 | 4 | 0.2% |
| 956.5 | 4 | 0.2% |
| 1226.1 | 4 | 0.2% |
| 1030.2 | 4 | 0.2% |
| 1519 | 3 | 0.1% |
| 896.8 | 3 | 0.1% |
| 635.6 | 3 | 0.1% |
| 1018.4 | 3 | 0.1% |
| Other values (1867) | 2211 |
| Value | Count | Frequency (%) |
| 523.4 | 1 | |
| 559.7 | 1 | |
| 560.8 | 1 | |
| 561.6 | 1 | |
| 563.2 | 1 | |
| 563.3 | 1 | |
| 563.6 | 1 | |
| 564 | 1 | |
| 564.5 | 1 | |
| 564.6 | 2 |
| Value | Count | Frequency (%) |
| 2211.4 | 1 | |
| 2185.9 | 1 | |
| 2073.8 | 1 | |
| 2068.6 | 1 | |
| 2065.2 | 1 | |
| 2025 | 1 | |
| 2016.1 | 1 | |
| 2004.5 | 1 | |
| 2003.4 | 1 | |
| 1989.8 | 1 |
sensor_5
| Distinct | 2017 |
|---|---|
| Distinct (%) | 89.8% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 1029.851535 |
| Minimum | 218.8 |
|---|---|
| Maximum | 2593.8 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 17.7 KiB |
Quantile statistics
| Minimum | 218.8 |
|---|---|
| 5-th percentile | 402.85 |
| Q1 | 688.55 |
| median | 973.1 |
| Q3 | 1324 |
| 95-th percentile | 1812.11 |
| Maximum | 2593.8 |
| Range | 2375 |
| Interquartile range (IQR) | 635.45 |
Descriptive statistics
| Standard deviation | 434.863287 |
|---|---|
| Coefficient of variation (CV) | 0.4222582305 |
| Kurtosis | -0.2189117863 |
| Mean | 1029.851535 |
| Median Absolute Deviation (MAD) | 311.6 |
| Skewness | 0.5342250443 |
| Sum | 2314076.4 |
| Variance | 189106.0784 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 1153.6 | 3 | 0.1% |
| 585.9 | 3 | 0.1% |
| 1043.5 | 3 | 0.1% |
| 557.8 | 3 | 0.1% |
| 796.1 | 3 | 0.1% |
| 549.1 | 3 | 0.1% |
| 1521 | 3 | 0.1% |
| 910.4 | 3 | 0.1% |
| 661.5 | 3 | 0.1% |
| 1646.4 | 3 | 0.1% |
| Other values (2007) | 2217 |
| Value | Count | Frequency (%) |
| 218.8 | 1 | |
| 222.5 | 1 | |
| 225 | 1 | |
| 229.7 | 1 | |
| 245.4 | 1 | |
| 251.9 | 1 | |
| 253.2 | 1 | |
| 257 | 1 | |
| 258 | 1 | |
| 259.4 | 1 |
| Value | Count | Frequency (%) |
| 2593.8 | 1 | |
| 2547.3 | 1 | |
| 2424.2 | 1 | |
| 2378.6 | 1 | |
| 2362.8 | 1 | |
| 2359.6 | 1 | |
| 2327.5 | 1 | |
| 2288.5 | 1 | |
| 2280.4 | 1 | |
| 2279.6 | 1 |
Spearman's ρ
Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation. To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
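The calculation described above can be sketched in plain Python: rank both variables (tied values share the mean of their positions), then apply the covariance-over-standard-deviations formula to the ranks. The helper names `average_ranks`, `pearson_r` and `spearman_rho` are illustrative and are not part of any library; in practice `pandas.DataFrame.corr(method="spearman")` computes the same coefficient.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1 to n; tied values get the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions are 0-based, ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def spearman_rho(x, y):
    """Spearman's rho: Pearson's r applied to the rank variables."""
    return pearson_r(average_ranks(x), average_ranks(y))


# A monotonic but nonlinear relationship: y = x cubed.
x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]
print(f"Spearman rho = {spearman_rho(x, y):.3f}")
print(f"Pearson r    = {pearson_r(x, y):.3f}")
```

Because the relationship is perfectly monotonic, ρ reaches its maximum while Pearson's r stays below 1.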
Pearson's r
Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, implying that for a linear function the angle to the x-axis does not affect r. To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
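A short sketch of this formula and of the invariance property, using the first ten rows of deg_C and relative_humidity from the sample table. It assumes deg_C is a temperature in degrees Celsius, so that converting it to Fahrenheit is a change of location and (positive) scale. The helper name `pearson_r` is illustrative, and ten rows are not representative of the full dataset.

```python
from math import sqrt


def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# First ten rows, copied from the sample table in this report.
deg_c = [8.0, 5.1, 5.8, 5.0, 4.5, 4.5, 3.3, 3.2, 2.5, 3.9]
rel_hum = [41.3, 51.7, 51.5, 52.3, 57.5, 53.7, 54.8, 60.7, 65.7, 57.8]

r = pearson_r(deg_c, rel_hum)

# Assumed unit conversion: Celsius to Fahrenheit (shift by 32, scale by 9/5).
deg_f = [c * 9 / 5 + 32 for c in deg_c]
r_f = pearson_r(deg_f, rel_hum)

print(f"r with deg_C:         {r:.6f}")
print(f"r after rescaling:    {r_f:.6f}")
```

Both lines print the same coefficient up to floating-point error, since shifting and positively rescaling one variable leaves r unchanged.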
Kendall's τ
Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation. To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
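The pair-counting definition above can be sketched directly. This version divides by the total number of pairs exactly as described, which corresponds to the variant usually called tau-a; libraries such as SciPy default to the tie-corrected tau-b, so results can differ when the data contain ties. The function name `kendall_tau_a` is illustrative.

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """(concordant pairs - discordant pairs) / total number of pairs."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        direction = (x1 - x2) * (y1 - y2)
        if direction > 0:
            concordant += 1   # both variables move in the same direction
        elif direction < 0:
            discordant += 1   # the variables move in opposite directions
        # Pairs tied in x or y count as neither concordant nor discordant.
    n = len(x)
    total_pairs = n * (n - 1) // 2
    return (concordant - discordant) / total_pairs


print(kendall_tau_a([1, 2, 3, 4], [1, 2, 3, 4]))  # identical ordering
print(kendall_tau_a([1, 2, 3, 4], [4, 3, 2, 1]))  # reversed ordering
print(kendall_tau_a([1, 2, 3], [1, 3, 2]))        # one discordant pair of three
```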
Phik (φk)
Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency and reverts to the Pearson correlation coefficient in case of a bivariate normal input distribution. Extensive documentation is available in the Phik project documentation.
A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion visually.
First rows
| | date_time | deg_C | relative_humidity | absolute_humidity | sensor_1 | sensor_2 | sensor_3 | sensor_4 | sensor_5 |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 2011-01-01 00:00:00 | 8.0 | 41.3 | 0.4375 | 1108.8 | 745.7 | 797.1 | 880.0 | 1273.1 |
| 1 | 2011-01-01 01:00:00 | 5.1 | 51.7 | 0.4564 | 1249.5 | 864.9 | 687.9 | 972.8 | 1714.0 |
| 2 | 2011-01-01 02:00:00 | 5.8 | 51.5 | 0.4689 | 1102.6 | 878.0 | 693.7 | 941.9 | 1300.8 |
| 3 | 2011-01-01 03:00:00 | 5.0 | 52.3 | 0.4693 | 1139.7 | 916.2 | 725.6 | 1011.0 | 1283.0 |
| 4 | 2011-01-01 04:00:00 | 4.5 | 57.5 | 0.4650 | 1022.4 | 838.5 | 871.5 | 967.0 | 1142.3 |
| 5 | 2011-01-01 05:00:00 | 4.5 | 53.7 | 0.4759 | 1004.0 | 745.5 | 914.2 | 989.1 | 973.8 |
| 6 | 2011-01-01 06:00:00 | 3.3 | 54.8 | 0.4636 | 940.9 | 738.2 | 816.0 | 896.8 | 1049.4 |
| 7 | 2011-01-01 07:00:00 | 3.2 | 60.7 | 0.4667 | 954.5 | 713.9 | 834.7 | 935.6 | 956.3 |
| 8 | 2011-01-01 08:00:00 | 2.5 | 65.7 | 0.4721 | 969.9 | 679.1 | 943.8 | 959.3 | 892.0 |
| 9 | 2011-01-01 09:00:00 | 3.9 | 57.8 | 0.4807 | 976.6 | 655.5 | 996.0 | 906.0 | 817.5 |
Last rows
| | date_time | deg_C | relative_humidity | absolute_humidity | sensor_1 | sensor_2 | sensor_3 | sensor_4 | sensor_5 |
|---|---|---|---|---|---|---|---|---|---|
| 2237 | 2011-04-04 05:00:00 | 10.7 | 61.7 | 0.7550 | 941.3 | 549.1 | 1098.5 | 947.5 | 549.1 |
| 2238 | 2011-04-04 06:00:00 | 9.5 | 66.9 | 0.7531 | 989.8 | 686.2 | 805.6 | 1061.3 | 841.6 |
| 2239 | 2011-04-04 07:00:00 | 9.1 | 60.7 | 0.7446 | 1411.7 | 1135.5 | 474.7 | 1584.0 | 1515.3 |
| 2240 | 2011-04-04 08:00:00 | 13.4 | 47.4 | 0.7553 | 1402.6 | 1389.2 | 427.4 | 1652.6 | 1670.9 |
| 2241 | 2011-04-04 09:00:00 | 18.9 | 37.8 | 0.7487 | 1284.0 | 1102.0 | 501.9 | 1347.5 | 1567.2 |
| 2242 | 2011-04-04 10:00:00 | 23.2 | 28.7 | 0.7568 | 1340.3 | 1023.9 | 522.8 | 1374.0 | 1659.8 |
| 2243 | 2011-04-04 11:00:00 | 24.5 | 22.5 | 0.7119 | 1232.8 | 955.1 | 616.1 | 1226.1 | 1269.0 |
| 2244 | 2011-04-04 12:00:00 | 26.6 | 19.0 | 0.6406 | 1187.7 | 1052.4 | 572.8 | 1253.4 | 1081.1 |
| 2245 | 2011-04-04 13:00:00 | 29.1 | 12.7 | 0.5139 | 1053.2 | 1009.0 | 702.0 | 1009.8 | 808.5 |
| 2246 | 2011-04-04 14:00:00 | 27.9 | 13.5 | 0.5028 | 1124.6 | 1078.4 | 608.2 | 1061.3 | 816.0 |